Abstract

We pursue a systematic study of the following problem. Let $f : \{0,1\}^n \to \{0,1\}$ be a
(usually monotone) boolean function whose behaviour is well understood when the input
bits are independent and identically distributed. What can be said about the behaviour of the
function when the input bits are not completely independent, but only $k$-wise independent,
i.e. every subset of $k$ bits is independent? More precisely, how high should $k$ be so that
any $k$-wise independent distribution "fools" the function, i.e. causes it to behave nearly the
same as when the bits are completely independent?

In this paper, we are mainly interested in asymptotic results about monotone functions
which exhibit sharp thresholds, i.e. there is a critical probability $p_c$ such that $P(f = 1)$,
under the completely independent distribution with marginal $p$, makes a sharp transition
from being close to 0 to being close to 1 in the vicinity of $p_c$. For such (sequences of)
functions we define two notions of "fooling": $K_1$ is the independence needed in order to force
the existence of the sharp threshold (which must then be at $p_c$); $K_2$ is the independence
needed to "fool" the function at $p_c$.

In order to answer these questions, we explore the extremal properties of $k$-wise independent
distributions and provide ways of constructing such distributions. These constructions
are connected to linear error correcting codes.

We also utilize duality theory and show that for the function $f$ to behave (almost)
the same under all $k$-wise independent inputs is equivalent to the function $f$ being well
approximated by a real polynomial in a certain fashion. This type of approximation is
stronger than approximation in $L_1$.

We analyze several well known boolean functions (including AND, Majority, Tribes and 
Percolation among others), some of which turn out to have surprising properties with respect 
to these questions. 

In some of our results we use tools from the theory of the classical moment problem, 
seemingly for the first time in this subject, to shed light on these questions. 
1 Introduction



Let $f : \{0,1\}^n \to \{0,1\}$ be a boolean function whose behaviour is well understood when the
input bits are independent and identically distributed, with probability $p$ for each bit to be
1. As an example we may consider the majority function, Maj, whose output is the bit which
occurs more often in the input (suppose that $n$ is odd). When $p = 1/2$ we know that the output is
also distributed uniformly. When $p < 1/2$ the output tends to be 0. More precisely, if $p < 1/2$
is constant, the probability of Maj = 1 decays exponentially fast with $n$.

Suppose, however, that the input bits are not truly IID. For example, they might be the
result of a derandomization procedure. A reasonable but weaker assumption would be that
the probability of each bit to be 1 is still $p$, and that they are $k$-wise independent, i.e. the
distribution of any $k$ of the bits is independent.

Under this assumption, what can be said about the distribution of $f$? For fixed $p$, which $k$
(as a function of $n$) is enough to guarantee the same asymptotic behaviour? Majority turns out
to be relatively easy to analyze: $k = 2$ is enough to guarantee that for any fixed $p < 1/2$, the
probability of Maj = 1 tends to 0 (though only polynomially fast), while for $p = 1/2$, we have
$P(\mathrm{Maj} = 1)$ guaranteed to tend to 1/2 if and only if $k = \omega(1)$ ("guarantee" here means that
Maj behaves as prescribed under any $k$-wise independent distribution). In fact, for $p = 1/2$ we
have a more precise result: $|P(\mathrm{Maj} = 1) - 1/2| \le O(1/\sqrt{k})$ under any $k$-wise independent
distribution. As can be seen, the $k$ needed to "fool" majority at $p \ne p_c$ (which we denote
$K_1$) is much smaller than the $k$ needed to "fool" majority at $p_c$ (which we denote $K_2$). This
phenomenon is shared by the other functions we explore, and we provide a partial explanation.
Other functions exhibit much more complex behaviour and the required analysis is accordingly
complex. We pursue a systematic study of the above question.

$k$-wise independent distributions are often used in computer science for derandomization of
algorithms. This was initiated by the papers [2], [T3], [25], [3T] and further developed in [TU],
[36], [32], [H], [26], [27] and others (see [33] for a survey). For derandomization one checks that
the algorithm still behaves (about) the same on a particular $k$-wise independent input as in the
completely independent case. The question we ask is of the same flavor: for a given boolean
function $f$, we ask how much independence is required for it to behave about the same on all
$k$-wise independent inputs (including the completely independent one).

Typically, $k$-wise independent distributions are constructed by sampling a uniform point of
a small sample space, which is usually also a linear subspace ([21], [23], [H]). In this work,
as in the works [26], [27], we do not impose this restriction and consider general $k$-wise
independent distributions. Still, our work is of interest even for the reader only interested in
the more restrictive model since, on the one hand, anything we show is impossible would still
be impossible in that model and, on the other hand, almost all of our constructions are of the
linear subspace type. Interestingly, in section 4.8 we give an example where the general and
more restrictive cases give asymptotically different results, i.e., the general distribution case
is richer, not just up to constants, in what can be achieved with it.

The tools we use include the duality of linear programming, used in section 3.1 to show an
equivalence between our question and the question of approximating the function $f$ by a real
polynomial in a certain "sandwich $L_1$" approximation (stronger than ordinary $L_1$ approximation).
This connects our results to the subject of approximation of boolean functions, used for
example in learning theory (e.g. [29], [38], [5], [40]).

In section 3.2 we recall a theorem about weak convergence of distributions, later used to
give sharp bounds on $K_2$ very easily. In section 3.3 we introduce a tool from the Theory of the
Classical Moment Problem (TCMP), seemingly for the first time in this context. In sections
4.8 and 4.9 we use it to prove bounds on the maximal and minimal probabilities of all bits to
be 1 under a $k$-wise independent distribution, in a simple way. We then observe that if $p = 1/q$
for a prime-power $q$, then an upper bound on this maximal probability translates to a lower
bound on the size of a symmetric sample space for $k$-wise independent $GF(q)$-valued random
variables. We apply our upper bound to obtain new lower bounds for such sample spaces. For
the binary case $q = 2$ our bound equals the well-known bound of [2], [13].

In section 4 we explore $K_1$ and $K_2$ for various boolean functions, and also prove some
general theorems. In section 4.6 we present a novel construction of a distribution (of the linear
subspace type) designed to change the behaviour of a particular function. We use a variation
of the $(u \mid u+v)$ construction of error-correcting codes [31] and we would like to emphasize
the technique used there. We think there is a shortage of ways to construct $k$-wise independent
distributions with specified properties and that this technique will be useful for changing the
behaviour of other functions as well.

The approach in this paper is a little different from that usually taken in pseudo-random
generators (see [32]). There one seeks a distribution under which all functions from a certain
complexity class behave the same as on fully independent bits. In contrast, we start with a
function $f$ and wish to show that it behaves the same on all $k$-wise independent inputs. Still,
one may expect this to hold if the function $f$ is "simple enough". Indeed, a conjecture of Linial
and Nisan [30] makes this precise when $f$ is a function from the class $AC^0$. In section 4.3 we
recall the precise conjecture and make some modest progress towards confirming it.

There are other notions of "simple functions". Another such notion is that the function be
noise stable [7], i.e., having most of its Fourier mass on constant level coefficients. In section 4.7
we show a connection between the Fourier spectrum and the behaviour on $k$-wise independent
inputs, but surprisingly show that a noise stable function can behave very differently on $k$-wise
independent inputs than on fully independent inputs even when $k$ grows fast with $n$.

There is also a lot of interest in almost $k$-wise independent distributions ([37], pj], [HJ, [B], pJ2], [IB]).
Though our questions can equally be formulated for that case, in our work we concentrate only
on perfect $k$-wise independence. This is both because the analysis seems simpler for perfect
$k$-wise independence, which could serve as a starting point for further research, and because
we think the perfect $k$-wise independent case is interesting in its own right.

2 Basic definitions and properties 

We begin with a definition 

Definition 1 Let $\mathcal{A}(n,k,p)$ be the set of all $k$-wise independent distributions $Q$ on $n$ bits
$(X_1, \ldots, X_n)$ with $Q(X_i = 1) = p$ for all $i$.

Also denote by $\mathbb{P}_p$ the fully independent distribution on $n$ bits, each with probability $p$ to be 1.
In most of the sequel we will be concerned with understanding

$$\max_{Q \in \mathcal{A}(n,k,p)} Q(f = 1) \quad \text{and} \quad \min_{Q \in \mathcal{A}(n,k,p)} Q(f = 1) \tag{1}$$

for a boolean function $f : \{0,1\}^n \to \{0,1\}$ and given $n$, $k$ and $p$. We first note that $\mathcal{A}(n,k,p)$ is
a convex set, since, if the distribution on a subset of $k$ bits is independent with marginal $p$ in
both $Q_1$ and $Q_2$ then it is so also in $aQ_1 + (1-a)Q_2$.

This implies that the extremal values in (1) are attained at extreme points of $\mathcal{A}(n,k,p)$,
hence if we could only find all these extreme points we could then find the values (1) for all $f$.
Unfortunately saying anything about these extreme points appears to be very difficult and so
in the sequel we will need to resort to special methods for each function $f$ considered.
For later reference, we identify the two extreme points of $\mathcal{A}(n, n-1, \frac12)$. XOR0 is the
distribution on $(X_1, \ldots, X_n)$ having $\{X_i\}_{i=1}^{n-1}$ IID and $X_n = \sum_{i=1}^{n-1} X_i \bmod 2$, and XOR1 is the
same with $X_n = 1 + \sum_{i=1}^{n-1} X_i \bmod 2$.
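
The definitions of XOR0 and XOR1, and the convexity of $\mathcal{A}(n,k,p)$, can be checked by brute force for small $n$. The following is a minimal sketch under our own conventions: the helper names `xor_distribution`, `mix` and `is_k_wise_independent` are illustrative and do not come from the paper, and distributions are stored as dictionaries from outcomes to exact probabilities.

```python
from fractions import Fraction
from itertools import combinations, product

def xor_distribution(n, parity):
    """XOR0 (parity=0) or XOR1 (parity=1) as a dict: outcome -> probability."""
    dist = {}
    for head in product((0, 1), repeat=n - 1):
        last = (sum(head) + parity) % 2   # last bit is determined by the others
        dist[head + (last,)] = Fraction(1, 2 ** (n - 1))
    return dist

def mix(q1, q2, a):
    """Convex combination a*q1 + (1-a)*q2 of two distributions."""
    out = {}
    for x, pr in q1.items():
        out[x] = out.get(x, 0) + a * pr
    for x, pr in q2.items():
        out[x] = out.get(x, 0) + (1 - a) * pr
    return out

def is_k_wise_independent(dist, k, p):
    """True if every k coordinates are jointly i.i.d. Bernoulli(p) under dist."""
    n = len(next(iter(dist)))
    for idx in combinations(range(n), k):
        for pattern in product((0, 1), repeat=k):
            mass = sum(pr for x, pr in dist.items()
                       if all(x[i] == v for i, v in zip(idx, pattern)))
            expected = Fraction(1)
            for v in pattern:
                expected *= p if v else 1 - p
            if mass != expected:
                return False
    return True

n, half = 4, Fraction(1, 2)
xor0, xor1 = xor_distribution(n, 0), xor_distribution(n, 1)
mixture = mix(xor0, xor1, Fraction(1, 3))
for name, q in (("XOR0", xor0), ("XOR1", xor1), ("mixture", mixture)):
    print(name, is_k_wise_independent(q, n - 1, half), is_k_wise_independent(q, n, half))
```

In this small case each of the three distributions is $(n-1)$-wise independent but not $n$-wise independent; only the mixture with weight $1/2$ is the fully independent distribution.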

We next define precisely what we mean by "$k$ large enough so that $f$ behaves on all $k$-wise
independent inputs the same as on the fully independent input".

Definition 2 $\epsilon_f(k,p) = \max_{Q \in \mathcal{A}(n,k,p)} Q(f = 1) - \min_{Q \in \mathcal{A}(n,k,p)} Q(f = 1)$.
$k_f(\epsilon,p)$ is the minimal $k$ such that $\epsilon_f(k,p) < \epsilon$.

We will be mostly interested in asymptotic (in $n$) results. Let $f_n : \{0,1\}^n \to \{0,1\}$ be a
sequence of monotone boolean functions. Assume that the functions have a sharp threshold,
i.e. there is a $p_c$ such that $\lim_{n \to \infty} \mathbb{P}_p(f_n = 1)$ is 0 if $p < p_c$ and 1 if $p > p_c$.

For example, any sequence of balanced monotone transitive functions has a sharp threshold, 
as is proved by Friedgut and Kalai [TT] . 

Definition 3 $K_1$ is the class of functions $k(n)$ such that $\epsilon(k,p) \to 0$ for any $p \ne p_c$.
$K_2$ is the class of functions $k(n)$ such that $\epsilon(k,p_c) \to 0$.

In other words, $K_1$-wise independence is enough to guarantee the existence of a sharp threshold
(which is then necessarily at $p_c$), while $K_2$-wise independence is enough to guarantee that $f$
behaves as if the bits were completely independent when $p = p_c$.

Notice that while $K_1$ and $K_2$ are classes of functions, we occasionally abuse the formal
notation and write, as above, $K_1$-wise independence. Similarly, we write $K_1 > k(n)$ to indicate
that $k(n)$ does not belong to $K_1$, or $K_2 \le \omega(1)$ to indicate $K_2 \supseteq \omega(1)$, etc.

It is not a priori clear whether $K_1 \le K_2$ or vice versa (or neither). Consult the appendix
for a partial result. In all the examples we encountered $K_2$ is at least $\omega(K_1)$.

3 General tools 

In this section we discuss some general tools for finding $K_1$ and $K_2$ as defined in the previous
section.

3.1 Duality - Approximation by polynomials 

We note that the values (1) are the solution to a simple linear program. What is the dual of
this program? We observe

Proposition 4 For any $f : \{0,1\}^n \to \{0,1\}$, any $k$ and any $0 < p < 1$,

$$\begin{aligned}
\max_{Q \in \mathcal{A}(n,k,p)} Q(f = 1) &= \min_{P \in \mathcal{P}_k^+(f)} E_{\mathbb{P}_p} P(X_1, \ldots, X_n) \\
\min_{Q \in \mathcal{A}(n,k,p)} Q(f = 1) &= \max_{P \in \mathcal{P}_k^-(f)} E_{\mathbb{P}_p} P(X_1, \ldots, X_n)
\end{aligned} \tag{2}$$

where $\mathcal{P}_k^+(f)$ is the set of all real polynomials $P : \mathbb{R}^n \to \mathbb{R}$ of degree not more than $k$ satisfying
$P \ge f$ on all points of the boolean cube. $\mathcal{P}_k^-(f)$ is defined analogously with $P \le f$.

The proof is simple using linear programming duality. We deduce that $\epsilon_f(k,p) \le \epsilon$ is equivalent
to having two polynomials $P^+ \ge f$ and $P^- \le f$ of degree not more than $k$ with $E_{\mathbb{P}_p}(P^+ - P^-) \le \epsilon$.
We call this type of approximation of $f$ a "sandwich $L_1$" approximation. In section 4.7 we
show that it is strictly stronger than $L_1$ approximation (by real polynomials of degree not more
than $k$). Whether it is stronger than $L_2$ approximation is one of our main open questions.
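
A tiny worked instance of this duality, for majority on $n = 3$ bits with $k = 2$ and $p = 1/2$, is sketched below. The two sandwiching polynomials are our own choice for illustration (they are not taken from the paper): the upper one is $x_1x_2 + x_1x_3 + x_2x_3$ and the lower one is $-(S-1)(S-4)/2$ with $S = x_1 + x_2 + x_3$. The code checks that they sandwich the function on the cube and compares their means under fair bits with the values attained by the XOR0 and XOR1 distributions.

```python
from fractions import Fraction
from itertools import product

def maj3(x):
    """Majority of three bits."""
    return 1 if sum(x) >= 2 else 0

def upper(x):
    """Degree-2 polynomial x1*x2 + x1*x3 + x2*x3 (illustrative choice)."""
    return x[0] * x[1] + x[0] * x[2] + x[1] * x[2]

def lower(x):
    """Degree-2 polynomial -(S-1)(S-4)/2 with S = x1+x2+x3 (illustrative choice)."""
    s = sum(x)
    return Fraction(-(s - 1) * (s - 4), 2)

cube = list(product((0, 1), repeat=3))

def uniform_mean(g):
    """Expectation of g under three independent fair bits."""
    return sum(Fraction(g(x)) for x in cube) / len(cube)

def xor_prob_maj(parity):
    """Q(Maj = 1) when Q is uniform on strings whose bit sum has the given parity."""
    support = [x for x in cube if sum(x) % 2 == parity]
    return Fraction(sum(maj3(x) for x in support), len(support))

sandwich_ok = all(lower(x) <= maj3(x) <= upper(x) for x in cube)
print(sandwich_ok, uniform_mean(lower), uniform_mean(upper))
print(xor_prob_maj(0), xor_prob_maj(1))   # XOR0 and XOR1 on three bits
```

Since XOR0 and XOR1 are 2-wise independent, the means of the two polynomials bound $Q(\mathrm{Maj}_3 = 1)$ from both sides, and in this instance the bounds are attained, so both sides of (2) can be read off directly.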
3.2 Distributions determined by their moments

Definition 5 We say that a real random variable $X$ has distribution determined by its moments
if any random variable $Y$ satisfying $E X^m = E Y^m$ for all integers $m \ge 1$ has the same
distribution as $X$.

We shall often use the following principle

Proposition 6 Suppose a sequence of RVs $\{X_n\}_n$ satisfies, for all $m$, $E X_n^m \to E X^m$ for some
RV $X$ whose distribution is determined by its moments. Then $X_n \to X$ in the weak sense.

For the proof, see [15], section 2.3. We remark that a distribution is determined by its moments
whenever these do not grow too fast. The best criterion is called Carleman's condition (see [15]).
But for our purposes it will mostly be enough to know that the Normal and Poisson distributions
are determined by their moments.

3.3 Bounds from the classical moment problem 

Given a real sequence $S := \{s_m\}_{m=0}^{k}$, with $k$ even and $s_0 = 1$, let

$$\mathcal{A}_S = \{Q \mid Q \text{ a probability distribution on } \mathbb{R},\ s_m = E_Q(X^m) \text{ for } 0 \le m \le k\} \tag{3}$$

be all probability distributions with these first $k$ moments ($X$ is a random variable distributed
according to $Q$). In the theory of the classical moment problem [1], [28], based on $S$ a certain
sequence of real functions $\rho_m$ is defined and the following theorem is proved

Theorem 7 [1, 2.5.2 and 2.5.4] For any $x$ and any $Q_1, Q_2 \in \mathcal{A}_S$

$$|Q_1(X \le x) - Q_2(X < x)| \le \rho_{k/2}(x) \tag{4}$$

and in particular by taking $Q_1 = Q_2$ we get $\max_{Q \in \mathcal{A}_S} Q(X = x) \le \rho_{k/2}(x)$.

For brevity we do not give the general definition of $\rho_m$ here but defer it to the appendix. In
the case of interest for us $s_m := E(X^m)$ when $X \sim \mathrm{Bin}(n,p)$ and then $\rho_m(x) := \left(\sum_{j=0}^{m} P_j(x)^2\right)^{-1}$
where $\{P_j\}_j$ are the (normalized) Krawtchouk polynomials (see [43]). These polynomials are
very well known and from them we easily deduce the following (see appendix for a proof)

$$\rho_m(n) = \frac{p^n}{P(\mathrm{Bin}(n, 1-p) \le m)} \tag{5}$$

$$\rho_m\!\left(\frac{n}{2}\right) \le \frac{2}{\sqrt{m}} \quad \text{for } p = \frac12,\ \text{even } n \text{ and even } m \le \frac{n}{2} \tag{6}$$

In many cases, the theory also has constructions achieving the bound of Theorem 7. However,
these are not necessarily supported by the integers, which we require. It might be that they
can be suitably modified to give sharp results in our cases.
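
The closed form for $\rho_m(n)$ in terms of a binomial tail can be sanity-checked numerically. The sketch below is an illustration under our own assumptions: it builds the orthonormal polynomials for the $\mathrm{Bin}(n,p)$ weights by exact Gram-Schmidt on $1, t, t^2, \ldots$ (rather than from an explicit Krawtchouk formula), evaluates $\rho_m$ as the reciprocal of $\sum_{j \le m} P_j(x)^2$, and compares it at $x = n$ with $p^n / P(\mathrm{Bin}(n,1-p) \le m)$. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def binom_cdf(n, q, m):
    """P(Bin(n, q) <= m) as an exact fraction."""
    return sum(comb(n, j) * q**j * (1 - q)**(n - j) for j in range(m + 1))

def rho_at(n, p, m, x):
    """rho_m(x) = 1 / sum_{j<=m} P_j(x)^2 for the Bin(n, p) weights (requires m <= n).

    Polynomials are stored as vectors of their values on the support 0..n,
    so x must be an integer in that range.
    """
    support = range(n + 1)
    weight = [comb(n, t) * p**t * (1 - p)**(n - t) for t in support]

    def inner(u, v):
        return sum(w * a * b for w, a, b in zip(weight, u, v))

    ortho = []            # orthogonal, not yet normalized, polynomials
    total = Fraction(0)   # running sum of P_j(x)^2
    for j in range(m + 1):
        q = [Fraction(t) ** j for t in support]
        for prev in ortho:                      # Gram-Schmidt step
            c = inner(q, prev) / inner(prev, prev)
            q = [a - c * b for a, b in zip(q, prev)]
        ortho.append(q)
        total += q[x] ** 2 / inner(q, q)        # P_j(x)^2 after normalization
    return 1 / total

n, p = 6, Fraction(1, 3)
for m in range(n + 1):
    print(m, rho_at(n, p, m, n), p**n / binom_cdf(n, 1 - p, m))
```

Exact rational arithmetic is used so that the two columns can be compared for equality rather than up to rounding.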

4 Boolean functions 

In this section we investigate $K_1$ and $K_2$ for several boolean functions, and also present some
general theorems. We start with a simple, but already non-trivial example.
4.1 Majority

Let $\mathrm{Maj}_n$ be the majority function on $n$ bits (for odd $n$). Let $S_n = \sum_{i=1}^{n} x_i$ where the $x_i$ are
the input bits. Let $\tilde{S}_n = (2S_n - n)/\sqrt{n}$. The central limit theorem implies that under $\mathbb{P}_{1/2}$,
$\tilde{S}_n \to N(0,1)$. Identifying $K_1$ is easy

Theorem 8 $K_1(\mathrm{Maj}) = 2$.

Proof. Obviously, $k = 1$ is not in $K_1$. However, for $Q \in \mathcal{A}(n,2,p)$ we have $E_Q(S_n) = np$ and
$\mathrm{Var}_Q(S_n) = np(1-p)$. If, WLOG, $p < 1/2$ then by Chebyshev's inequality

$$Q(S_n > n/2) \le Q\big(S_n - np \ge n(1/2 - p)\big) \le \frac{np(1-p)}{(n(1/2 - p))^2} = O\!\left(\frac{1}{n}\right). \qquad \blacksquare$$

Identifying $K_2$ is harder. The ideas of section 3.2 give the following

Proposition 9 $K_2(\mathrm{Maj}) \le \omega(1)$

Proof. Consider the distribution of $S_n$ under some $Q \in \mathcal{A}(n,k,1/2)$. Obviously, $E_Q(S_n^l) =
E_{\mathbb{P}_{1/2}}(S_n^l)$ for any $l \le k$. The same holds for $\tilde{S}_n$ as it is a linear function of $S_n$. Therefore,
$E_{Q_n}(\tilde{S}_n^l) \to s_l$ where $s_l = E(N(0,1)^l)$ is the $l$-th moment of a standard normal distribution.
The normal distribution is determined by its moments. Hence, if $k(n) \in \omega(1)$ and $Q_n \in
\mathcal{A}(n,k(n),1/2)$ then $\tilde{S}_n \to N(0,1)$ weakly by Proposition 6. In particular, $Q_n(\mathrm{Maj}_n = 1) =
Q_n(\tilde{S}_n > 0) \to 1/2$. ■
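
To see why a bounded $k$ cannot suffice at $p = 1/2$, it helps to look at a concrete pairwise independent distribution. The sketch below uses a standard construction that is not taken from the paper: from $r$ fair seed bits, output one bit for every nonempty subset of the seed, namely the XOR of the seed bits in that subset. This gives $n = 2^r - 1$ pairwise independent fair bits, and the code computes $Q(\mathrm{Maj}_n = 1)$ exactly. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def pairwise_sample_space(r):
    """All 2^r equally likely outcomes of n = 2^r - 1 bits built from r seed bits."""
    outcomes = []
    for seed in product((0, 1), repeat=r):
        bits = []
        for mask in range(1, 2 ** r):          # one output bit per nonempty subset
            b = 0
            for j in range(r):
                if (mask >> j) & 1:
                    b ^= seed[j]
            bits.append(b)
        outcomes.append(tuple(bits))
    return outcomes

def is_pairwise_uniform(outcomes):
    """True if every pair of coordinates is uniform on {0,1}^2."""
    n = len(outcomes[0])
    for i in range(n):
        for j in range(i + 1, n):
            counts = {}
            for w in outcomes:
                counts[(w[i], w[j])] = counts.get((w[i], w[j]), 0) + 1
            if sorted(counts.values()) != [len(outcomes) // 4] * 4:
                return False
    return True

def prob_majority_one(outcomes):
    """Probability that more than half of the bits are 1."""
    n = len(outcomes[0])
    return Fraction(sum(1 for w in outcomes if 2 * sum(w) > n), len(outcomes))

space = pairwise_sample_space(3)               # n = 7 bits
print(is_pairwise_uniform(space), prob_majority_one(space))
```

For every nonzero seed exactly $2^{r-1}$ of the $2^r - 1$ output bits equal 1, so majority is 1 unless the seed is all zeros; the probability is far from $1/2$ even though the bits are pairwise independent.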

In fact for Maj we can be much more specific. 

Theorem 10 There exists a $C > 0$ such that for any even $2 \le k < n$

$$\frac{1}{C\sqrt{k \log k}} \le \max_{Q \in \mathcal{A}(n,k,\frac12)} \left| Q(\mathrm{Maj}_n = 1) - \frac12 \right| \le \frac{C}{\sqrt{k}} \tag{7}$$

And when $Q_0 \in \mathcal{A}(n, n-1, \frac12)$ is the XOR0 distribution we have $|Q_0(\mathrm{Maj}_n = 1) - \frac12| \ge \frac{1}{3\sqrt{n}}$.

The theorem implies that $K_2 = \omega(1)$, but is much stronger in that it bounds $\epsilon_{\mathrm{Maj}}(k, \frac12)$.

Proof. The claim about XOR0 is easy to verify directly. The lower bound comes from a
direct construction sketched in the appendix. The upper bound is actually known in the context
of error-correcting codes [34, Ch. 9, Thm. 23] and it appears the proof there also works in our
case. But we point out that a very simple proof of it can be obtained just by applying Theorem
7 and (6) to the distribution of $S_n$, and this proof even improves a little on the constant. ■
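
The claim about XOR0 can indeed be verified directly, since XOR0 is uniform on the even-weight strings. The sketch below computes $Q_0(\mathrm{Maj}_n = 1)$ exactly for a few odd $n$ and compares the bias with $1/(3\sqrt{n})$, the constant quoted for $Q(y_i = 1)$ in the proof of Theorem 16. The function name is illustrative.

```python
from fractions import Fraction
from math import comb, sqrt

def xor0_majority_prob(n):
    """Q0(Maj_n = 1) for odd n: XOR0 is uniform on the 2^(n-1) even-weight strings."""
    assert n % 2 == 1
    favourable = sum(comb(n, w) for w in range(0, n + 1, 2) if 2 * w > n)
    return Fraction(favourable, 2 ** (n - 1))

for n in (3, 7, 11, 51):
    bias = abs(float(xor0_majority_prob(n)) - 0.5)
    print(n, xor0_majority_prob(n), bias, 1 / (3 * sqrt(n)))
```

The bias is positive when $(n+1)/2$ is even and negative otherwise, which is why a parity assumption appears when this distribution is reused later.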



4.2 Tribes 

Let $m$ be an integer, let $n = m2^m$ and let $m(n)$ be its inverse function. $\mathrm{Tribes}_n$ is the
following function: let the input bits be divided into $2^m$ sets of size $m$ each, called tribes.
Let $y_i$ be the AND of the bits in the $i$-th tribe. Then $\mathrm{Tribes}_n$ is the OR of the $y_i$'s. Let
$S_n = \sum_{0 \le i < 2^m} y_i$. Then $\mathrm{Tribes}_n = 0$ iff $S_n = 0$. Under $\mathbb{P}_{1/2}$, $S_n \to \mathrm{Poisson}(1)$. It is easily
checked that Tribes is a sequence of monotone functions with sharp threshold at $p_c = 1/2$ and

$$\mathbb{P}_{1/2}(\mathrm{Tribes}_n = 1) \to 1 - 1/e.$$
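
The limit $1 - 1/e$ is easy to observe numerically in the fully independent case. The sketch below uses the fact that under independent fair bits each tribe is all ones with probability $2^{-m}$ and the $2^m$ tribes are independent; the function name is illustrative.

```python
import math

def tribes_prob_one(m):
    """P(Tribes_n = 1) for n = m * 2**m independent fair bits."""
    p_tribe_all_ones = 2.0 ** (-m)
    return 1.0 - (1.0 - p_tribe_all_ones) ** (2 ** m)

for m in (1, 2, 4, 8, 16):
    print(m, tribes_prob_one(m))
print("limit", 1 - 1 / math.e)
```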

Theorem 11 For some $C > 0$, $C m(n) \le K_1(\mathrm{Tribes}) \le 2m(n)$

Proof. The proof is similar to that of Theorem 8. First, notice that for $Q \in \mathcal{A}(n, m(n), p)$ we
have $Q(y_i = 1) = p^m$. For $p < 1/2$, a union bound now yields $Q(\max_{0 \le i < 2^m} y_i = 1) \le (2p)^m \to 0$.
If $Q \in \mathcal{A}(n, 2m(n), p)$ then the $y_i$'s are pairwise independent. Using Chebyshev's inequality
on $S_n$ yields the desired result for $p > 1/2$.

For the lower bound we use equation (15) to produce a $Q \in \mathcal{A}(n, Cm(n), p)$ such that the
probability that all bits of any given tribe are 1 is 0. ■

Theorem 12 $K_2(\mathrm{Tribes}) \le \omega(m(n)) = \omega(\log n)$

Proof. This is like the proof of Proposition 9. There is no need to normalize $S_n$ as it tends to
Poisson(1) as is. Again, we only need to check that the Poisson distribution satisfies Carleman's
condition. ■

More refined results, like those for Maj, can be reached using Theorem 7.

Theorem 13 $\epsilon_{\mathrm{Tribes}_n}(k\,m(n), 1/2) \le$

4.3 $AC^0$ functions

$AC^0$ is the class of functions computable by boolean circuits of bounded depth using NOT gates
and a polynomial number of AND and OR gates (with unlimited fan-in). Tribes is a
notable example of an $AC^0$ function of depth 2. Linial and Nisan ([30]) conjectured that any
boolean circuit of depth $d$ and size $s$ has $K_2 \supseteq \omega(\log^{d-1} s)$.

We prove a very special case of this conjecture. Let n = 2 2m and let the input bits be divided 
into disjoint sets, $A_i$, consisting of $m$ bits each. A function is paired if it is the OR of AND
gates, each operating on the bits in exactly 2 of the $A_i$'s. A paired function is, in particular, an
$AC^0$ function of depth 2.

Theorem 14 If $f$ is paired then $K_2(f) \le \omega(\log n)$

Proof sketch. Let $S(f)$ be the number of satisfied AND gates in $f$. The crux of the proof
is to trim $f$ by removing some of the AND gates to produce a function $f'$, which is (a) very
close to $f$ under any $\omega(\log n)$-wise independent distribution, and (b) $S(f')$ under $\mathbb{P}_p$ tends to a
RV which is determined by its moments. ■



4.4 Majority of majorities 

Let $m$ be an odd integer and let $n = m^2$. $\mathrm{Maj}^2$ is the following function: divide the input bits
into $m$ disjoint sets of size $m$. Let $y_i$ be the majority of the $i$-th set, then $\mathrm{Maj}^2$ is the majority
of the $y_i$'s.

Theorem 15 $K_1(\mathrm{Maj}^2) = 2$

Proof. The proof of Theorem 8 yields $Q(y_i = 1) \le 1/(m(1-2p)^2)$ for any $Q \in \mathcal{A}(n,k,p)$, when $p < 1/2$.
Therefore $E_Q(\sum_i y_i) \le 1/(1-2p)^2$. The $y_i$'s are not pairwise independent, but Markov's
inequality is enough: $Q(\mathrm{Maj}^2 = 1) \le 2/(m(1-2p)^2) \to 0$. ■

Notice that this proof applies also to $l$-level majority, $\mathrm{Maj}^l$, defined similarly on $m^l$ bits.

Theorem 16 $\sqrt{n} \le K_2 \le \omega(\sqrt{n})$

Proof. To show that any function $k \in \omega(\sqrt{n})$ belongs to $K_2(\mathrm{Maj}^2)$ notice that if $Q \in
\mathcal{A}(n,k,1/2)$ then the distribution generated on the $y_i$'s belongs to $\mathcal{A}(m, k/m, 1/2)$. Since the
$y_i$'s enter majority to produce the output, it is enough, by Proposition 9, to have $k/m = \omega(1)$ in
order for the probability that the output is 1 to tend to 1/2.
To show that $k = m - 1$ is not in $K_2$, let $Q$ be the following distribution: $Q$ is XOR0 on each
$A_i$ and completely independent on different $A_i$'s. Obviously, $Q \in \mathcal{A}(n, m-1, 1/2)$. By Theorem
10, $Q(y_i = 1) \ge 1/2 + 1/(3\sqrt{m})$ (assume WLOG that $(m+1)/2$ is even). Let $S_n = \sum_{i=1}^{m} y_i$
and $\tilde{S}_n = (2S_n - m)/\sqrt{m}$. Since the $y_i$'s are independent we have that $\tilde{S}_n \to N(a, 1)$ where
$a = \lim (2Q(y_i = 1) - 1)\sqrt{m} \ge 2/3$. Obviously, $Q(\mathrm{Maj}^2 = 1) = Q(\tilde{S}_n > 0)$ is bounded away
from 1/2. ■

The surprising fact here is the lower bound of $\sqrt{n}$. First, it shows an example where $K_2$
is much larger than $\omega(K_1)$. Second, it demonstrates that $L_2$ approximation does not imply
"sandwich $L_1$" approximation (see section 3.1).
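
The distribution used in this lower bound is simple enough to evaluate exactly. The sketch below assumes the setup just described (each block of $m$ bits is an independent copy of XOR0, with $m$ odd) and computes $Q(\mathrm{Maj}^2 = 1)$ as a binomial tail in the block-majority probability. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def xor0_block_majority(m):
    """Q(y_i = 1): majority of m bits drawn uniformly from even-weight strings."""
    favourable = sum(comb(m, w) for w in range(0, m + 1, 2) if 2 * w > m)
    return Fraction(favourable, 2 ** (m - 1))

def maj2_prob(m):
    """Q(Maj^2 = 1) when the m blocks are independent XOR0 copies (m odd)."""
    q = xor0_block_majority(m)
    return sum(comb(m, j) * q**j * (1 - q)**(m - j)
               for j in range(m + 1) if 2 * j > m)

for m in (3, 7, 11):        # values with (m+1)/2 even
    print(m, float(xor0_block_majority(m)), float(maj2_prob(m)))
```

For these values the output probability stays well above $1/2$ although the input distribution is $(m-1)$-wise independent.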

4.5 Composition of functions 

$\mathrm{Maj}^2$ is a simple example of composition of functions. What can we say about compositions in
general?

Let $n = ml$ and let $f = g(h_1, \ldots, h_m)$ where the $h_i$'s receive disjoint sets $A_i$ of $l$ bits each.
Assume that the $h_i$'s are balanced with respect to $p_c$ and that $p_c(g) = 1/2$.

Theorem 17 For $\epsilon < \frac{1}{2m}$, $k_f(4m\epsilon, p_c) \le \sum_i k_{h_i}(\epsilon, p_c)$

Proof. $g(y_1, \ldots, y_m)$ can be expressed as a sum of monomials of the form $\prod y_i \prod (1 - y_j)$, each
involving all of the $y$'s. We take the upper and lower "sandwich $L_1$" approximating polynomials
of each $h_i$ (which have degree $k_{h_i}(\epsilon, p_c)$) and plug the upper in place of any $y_i$ and one minus
the lower in place of any $(1 - y_j)$. This produces a polynomial of degree $k = \sum_i k_{h_i}(\epsilon, p_c)$ which
bounds $f$ from above. The error of each monomial, when the distribution is $k$-wise independent,
is at most $(1/2 + \epsilon)^m - 1/2^m \le m\epsilon/2^{m-2}$ for $\epsilon < \frac{1}{2m}$. Summing over the monomials we have
an error of no more than $4m\epsilon$. The lower bound is similar. ■

This is a very general bound: we did not put any restriction on $g$, and it can even be
nonmonotone. For example, $K_2$ for the XOR of two (or boundedly many) majorities is still $\omega(1)$.

For $0 < a < 1$, define $\mathrm{Maj}^2$ to be the majority of $n^a$ majorities of $n^{1-a}$ bits each. It is easy
to see that $K_2 \le \omega(n^{1-a})$. Theorem 17 gives a bound of $K_2 \le \omega(n^{3a})$. However, using finer
properties of the "sandwich $L_1$" approximating polynomials of Maj, we can do better.

Theorem 18 $K_2(\mathrm{Maj}^2) = \omega(n^{\min(a, 1-a)})$

Proof. We use the approximating polynomials of the upper Maj function (the "$g$") instead of
the generic polynomial of Theorem 17. These are not only of bounded degree, but also have
small coefficients. This implies that the resulting polynomial is of degree $O(m)$ and produces
an error of $O(n^{a/2}\epsilon)$, where $m$ is the degree of the approximating polynomial of the lower Maj
functions and $\epsilon$ is their error. Taking $m = n^a$ gives $\epsilon = 1/\sqrt{n^a} = n^{-a/2}$, as required. ■

4.6 Percolation 

Another very interesting example to consider is that of percolation. Briefly, percolation on
a graph $G = (V,E)$ is a distribution on $\{0,1\}^V$, where we identify the bits with the states
{open, closed}. We refer the reader to [20] for details of the theory of percolation. We denote
the set of all $k$-wise independent percolations with marginal probability $p$ for every vertex to be
open by $\mathcal{A}(G,k,p)$. When $G$ is infinite, we are interested in the probability of existence of an
infinite cluster of open vertices. This event is a boolean function on infinitely many bits.

Theorem 19 For $G = \mathbb{Z}^d$ or $G = T_d$ (the $d$-ary tree), for any $0 < p < 1$ and any $k$ there exists
a $Q \in \mathcal{A}(G,k,p)$ such that there is an infinite open cluster $Q$-almost surely, and another such
$Q$ with no infinite open cluster $Q$-almost surely.

The positive part of this theorem follows from the following 2 theorems about finite versions
of percolation. Let $f$ be the function indicating an open crossing of the $n \times n$ grid.
Theorem 20 2^ lo ^ n = (log n) 1 /^ log n < < a; ( logn ) 

For the tree case, we need to diverge slightly from the boolean valued setting. Let $f$ be the
number of open paths from the root to the leaves of $T_d^n$, the $n$-level $d$-ary tree.

Theorem 21 For any $p$, for $k = C \log n$, there is a $Q \in \mathcal{A}(T_d^n, k, p)$ such that $E_Q(f) \ge 2$.

To this end, we present a way of combining $k$-wise independent distributions to "amplify"
the amount of independence, inspired by the $(u \mid u+v)$ lemma of error-correcting codes. Let
$\mathbb{Z}_r$ be the cyclic group of size $r$. Let $\mathcal{A}_r(n,k)$ be the set of all $k$-wise independent distributions
on vectors $(X_1, \ldots, X_n) \in \mathbb{Z}_r^n$ with each $X_i$ uniform in $\mathbb{Z}_r$. Define $\mathcal{A}_r(G,k)$ similarly.

Lemma 22 Fix $m \ge 1$. Let $X := (X_1, \ldots, X_n) \in \mathcal{A}_r(n,k)$. Let $X^i := (X^i_j)_{j=1}^{n}$, $1 \le i \le m$, be $m$ IID
copies of $X$. Let also $Y := (Y_1, \ldots, Y_n) \in \mathcal{A}_r(n, 2k+1)$ be a vector independent of all the $X^i$'s.
Then the vector with the following coordinates

$$\begin{array}{cccc}
X^1_1 + Y_1, & X^1_2 + Y_2, & \ldots, & X^1_n + Y_n, \\
X^2_1 + Y_1, & X^2_2 + Y_2, & \ldots, & X^2_n + Y_n, \\
 & & \vdots & \\
X^m_1 + Y_1, & X^m_2 + Y_2, & \ldots, & X^m_n + Y_n
\end{array} \tag{8}$$

is in $\mathcal{A}_r(mn, 2k+1)$.

Consult the appendix for a proof of a more general result. 
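
The statement of Lemma 22 can be checked by enumeration in a very small case. The sketch below is an illustration under our own choice of parameters ($r = 2$, $k = 1$, $n = 3$, $m = 2$): $X$ repeats one fair bit three times (1-wise but not 2-wise independent) and $Y$ is three independent fair bits. It verifies that the six combined coordinates $X^i_j + Y_j$ are 3-wise independent, and that they are not 4-wise independent. The function names are illustrative.

```python
from collections import Counter
from itertools import combinations, product

def is_t_wise_uniform(outcomes, t, r):
    """True if every t coordinates are jointly uniform on Z_r^t.

    `outcomes` is a list of equally likely outcome tuples.
    """
    n = len(outcomes[0])
    for idx in combinations(range(n), t):
        counts = Counter(tuple(w[i] for i in idx) for w in outcomes)
        if len(counts) != r ** t or len(set(counts.values())) != 1:
            return False
    return True

def combine(x_space, y_space, m, r):
    """Sample space of the mn coordinates X^i_j + Y_j (mod r), i = 1..m, j = 1..n."""
    out = []
    for xs in product(x_space, repeat=m):      # m independent copies of X
        for y in y_space:                      # Y independent of all copies
            out.append(tuple((x[j] + y[j]) % r for x in xs for j in range(len(y))))
    return out

r, k, n, m = 2, 1, 3, 2
x_space = [(0, 0, 0), (1, 1, 1)]               # one fair bit repeated three times
y_space = list(product(range(r), repeat=n))    # fully independent fair bits
z_space = combine(x_space, y_space, m, r)

print(is_t_wise_uniform(x_space, 1, r), is_t_wise_uniform(x_space, 2, r))
print(is_t_wise_uniform(z_space, 2 * k + 1, r), is_t_wise_uniform(z_space, 2 * k + 2, r))
```

The failure at $2k+2 = 4$ coordinates comes from the four coordinates $X^1_1 + Y_1$, $X^1_2 + Y_2$, $X^2_1 + Y_1$, $X^2_2 + Y_2$, whose sum is always 0 modulo 2 in this example.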

Proof. (Sketch, of Theorem 21) We build distributions in $\mathcal{A}_r(T_d^n, k)$ such that when we identify
0 with open and the rest with closed, we get the desired percolation for $p = 1/r$.

The proof goes by induction. For $k = 1$ (i.e. no independence) a suitable distribution is just
taking the $X_i$ to be identical and $n$ to be large enough.

Assume we have $X \in \mathcal{A}_r(T_d^n, k)$ such that $E_X(f) \ge 2$. We will construct a suitable $Z \in
\mathcal{A}_r(T_d^m, 2k+1)$ for $m = n + n^2 k \log d$. Let the $X^i$ be independent copies of $X$ and let $Y \in
\mathcal{A}_q(n, 2k+1)$ be such that the probability of $Y = 0$ is maximal, which is roughly $d^{-nk}$, because
there are about $d^n$ RVs in $Y$. Using Lemma 22 we now assign the RVs in $X^i + Y$ to the vertices
of $T_d^m$ such that each $X^i$ is assigned to a subtree of depth $n$ with root at a level divisible by $n$.
Thus, with probability $d^{-nk}$ we have $Y = 0$ and then the open paths form a Galton-Watson tree
with an expectation of $2^{m/n} = 2^{1 + nk \log d}$. Thus, the total expectation is at least
$d^{-nk} 2^{1 + nk \log d} = 2$. ■

Notice that having an open path from the root to the leaves of $T_d^n$ is an $AC^0$ function of
depth 2. This is an example of a rather complicated depth 2 function, very different from the
paired functions considered in section 4.3. Also, this function does not exhibit a sharp threshold,
thus the need for different terminology.

4.7 Fourier transform and $K_2$

In this section we consider only the case $p = \frac12$. The quantity $E_Q(f)$ may be represented using
the Fourier transform as $\sum_S \hat{f}(S)\hat{Q}(S)$. When we consider $f$ as having values of $\pm 1$ we have
$\sum_S \hat{f}^2(S) = \sum_x f^2(x) = 2^n$. Therefore $\hat{f}^2(S)/2^n$ is a probability measure on all subsets of the
bits, called the Fourier mass. Now, a distribution is $k$-wise independent if and only if all of its
Fourier coefficients of levels between 1 and $k$ (inclusive) are 0. Therefore, if the Fourier mass of $f$
is supported by the first $k$ levels, then $E_Q(f) = \hat{f}(\emptyset) = \mathbb{P}_{1/2}(f)$. One might conjecture that if most
of the Fourier mass is on the first $k$ levels then $E_Q(f)$ would be small for all $Q \in \mathcal{A}(n,k,1/2)$.

In [29] and [22], Linial, Mansour and Nisan, with an improvement by Hastad, prove that
any $AC^0$ function has its Fourier mass concentrated on the first $O(\log^{d-1} s)$ levels (where $s$ is
the size and $d$ the depth). Had the above conjecture been true, we would have proved that
$K_2 = \omega(\log^{d-1} s)$ immediately for any such $AC^0$ function (see section 4.3).

However, $\mathrm{Maj}^2$ provides a counterexample for this conjecture, as its $K_2 \ge \sqrt{n}$ while its
Fourier mass is concentrated on the bounded levels, i.e., for any $\epsilon > 0$ there exists $C > 0$ such
that all but $\epsilon$ of the mass is below level $C$. This is because $\mathrm{Maj}^2$ is a composition of noise stable
functions and is therefore noise stable itself (see [7]). Of course, $\mathrm{Maj}^2$ is not an $AC^0$ function
so this conjecture might still be true in that domain.



4.8 Maximal probability that all bits are 1

In this section we investigate the maximal probability that all the bits are 1, i.e., the AND
function. At the end of the section two applications of our bounds are given.
Define $M(n,k,p) := \max_{Q \in \mathcal{A}(n,k,p)} Q(\text{All bits are 1})$, then

Theorem 23 For even $k$

$$M(n,k,p) \le \frac{p^n}{P(\mathrm{Bin}(n, 1-p) \le \frac{k}{2})} \tag{9}$$

Proof. Fix $Q \in \mathcal{A}(n,k,p)$ and let $S$ count the number of bits which are 1. Since $S$ has the same
first $k$ moments as a $\mathrm{Bin}(n,p)$, the result follows immediately from Theorem 7 and (5) applied
to $S$. ■

Bound (9) is a powerful bound in that it seems to give good results for most ranges of the
parameters. Here are some corollaries

Corollary 24

$$M(n,k,p) \le 2\sqrt{k}\left(\frac{kp}{2e(1-p)(n-k)}\right)^{k/2} \quad \text{for any } n, \text{ even } k \text{ and } p \tag{10}$$

$$M(n,k,p) \le 10p^n \quad \text{for } k \text{ even},\ n(1-p) \le \frac{k}{2} \tag{11}$$
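
Bound (9) is easy to evaluate exactly, and for $p = 1/2$, odd $n$ and $k = n - 1$ it can be compared with the XOR1 distribution, which is $(n-1)$-wise independent and contains the all-ones string in its support. The sketch below is an illustration with hypothetical helper names; it computes both quantities with exact fractions.

```python
from fractions import Fraction
from itertools import product
from math import comb

def binom_cdf(n, q, m):
    """P(Bin(n, q) <= m) as an exact fraction."""
    return sum(comb(n, j) * q**j * (1 - q)**(n - j) for j in range(m + 1))

def bound9(n, k, p):
    """Right-hand side of (9): p^n / P(Bin(n, 1-p) <= k/2), for even k."""
    assert k % 2 == 0
    return p**n / binom_cdf(n, 1 - p, k // 2)

def xor1_all_ones_prob(n):
    """Probability of the all-ones string under XOR1 on n bits."""
    hits = 0
    for head in product((0, 1), repeat=n - 1):
        last = (1 + sum(head)) % 2
        if all(head) and last == 1:
            hits += 1
    return Fraction(hits, 2 ** (n - 1))

n = 5
print(bound9(n, n - 1, Fraction(1, 2)), xor1_all_ones_prob(n))
```

In this instance the two numbers coincide, so the bound cannot be improved for these parameters.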

We add that it is possible to get a result similar to (10) by letting $S$ count the number of bits
which are 1, considering $(S - pn)^k$ and applying Chebyshev's inequality. Still, our approach with
Theorem 7 has the following advantages. First, it is quite simple, as the above proof of Theorem
23 shows. Second, it gives (10) in all ranges of the parameters $n$, $k$ and $p$, whereas estimating
$E(S - pn)^k$ appears to become difficult when $k$ also grows with $n$, or when $np$ is small. Third, it
seems to give slightly better results: the approach with Chebyshev's inequality apparently does
not give the factor 2 inside the brackets of (10).

We are also able to obtain exact results for $k = 2, 3$. This is done by adapting the closed-form
expressions appearing in Boros and Prekopa [9] to our settings.

Proposition 25 Let $M := \lfloor (n-1)(1-p) \rfloor$ and $\delta := \{(n-1)(1-p)\}$ (integer and fractional
parts respectively), and also $N := \lfloor (n-2)(1-p) \rfloor$ and $\epsilon := \{(n-2)(1-p)\}$, then

For lower bounds on $M(n,k,p)$ and an exact result for small $p$, see the appendix.

We present two applications of our bounds. First a definition: for $q$ a prime-power and
$k \ge 2$, a matrix $B \in M_{R \times n}(GF(q))$ is an $OA(n,k,q)$, or an orthogonal array of strength $k$ with
$q$ levels (see [31] and [23]), if a uniformly chosen row $(X_1, \ldots, X_n)$ of it has $k$-wise independent
entries, each uniform in $GF(q)$. If the rows of $B$ form a linear subspace, then $B$ is called a
linear orthogonal array and is referred to by its generator matrix $A \in M_{m \times n}$ whose rows are a
basis for the rows of $B$. We call $A$ a $GOA(n,k,q)$ for short.

1. The bound (9) can be used to give another proof of the Rao bound (see [23]) on the
minimal size of orthogonal arrays over $GF(q)$. To see this, suppose $B$ is an $OA(n,k,q)$
for $k$ even, with $R$ rows. We may assume $B$ contains the all zeroes vector. Consider the
distribution $Q \in \mathcal{A}(n,k,\frac1q)$ obtained by sampling uniformly a row of $B$ and mapping each
coordinate to a bit by $0 \mapsto 1$, other elements to 0. We have $Q(\text{All ones vector}) \ge \frac1R$, hence
by (9) we now get $R \ge q^n P(\mathrm{Bin}(n, 1 - \frac1q) \le \frac{k}{2})$, which is the Rao bound, or using the less
refined (10) we get $R \ge \left(\frac{2e(q-1)(n-k)}{k}\right)^{k/2} / (2\sqrt{k})$.

We mention in this context that for $q = 2$, this lower bound is equal to the bound
$m(n,k) := \sum_{j=0}^{k/2} \binom{n}{j}$ which also appeared in [2] (in a more general setting), but we note
that for $q = 2$ we obtain a somewhat stronger result: the bound (9) is in fact an upper
bound for the size of any atom of the distribution (by xoring a constant vector). Hence
for this case we improve slightly on the known results by adding that the maximum atom of
the distribution is bounded by $\frac{1}{m(n,k)}$, not just the size of the sample space.

2. Let $A$ be a $GOA(n,3,3)$ with $m$ rows. Since the columns of $A$ are 3-wise linearly independent, a theorem of Meshulam [3S] implies that $n = O(\frac{3^m}{m})$. Consider the distribution $Q$ in $\mathcal{A}(n,3,\frac{1}{3})$ obtained by sampling a uniform linear combination of the rows of $A$ and mapping to bits by, say, $0 \mapsto 1$ and $1, 2 \mapsto 0$. We have $Q(\text{All ones vector}) \le \frac{1}{3^m} = O(\frac{1}{nm})$. In contrast, by equation (13) there exists $Q' \in \mathcal{A}(n,3,\frac{1}{3})$ with $Q'(\text{All ones vector}) = \Omega(\frac{1}{n})$. We deduce that there is an asymptotic difference between what distributions obtained from linear orthogonal arrays (by the above method) and general distributions can achieve. This is interesting since most explicit constructions of $k$-wise independent distributions seem to be based on sampling from linear orthogonal arrays.
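The identity behind the first application can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names are ours, and the toy array (the even-weight binary vectors of length 3) is only an illustrative choice. It verifies that $q^n P(Bin(n, 1 - \frac{1}{q}) \le \frac{k}{2})$ coincides with the usual form $\sum_{j \le k/2} \binom{n}{j}(q-1)^j$ of the Rao bound, and that the toy array has strength 2 and meets the bound.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def rao_bound(n, k, q):
    """Rao bound for even k: sum_{j <= k/2} C(n, j) (q - 1)^j."""
    return sum(comb(n, j) * (q - 1) ** j for j in range(k // 2 + 1))


def binomial_form(n, k, q):
    """q^n * P(Bin(n, 1 - 1/q) <= k/2), computed exactly with fractions."""
    s = Fraction(q - 1, q)  # success probability 1 - 1/q
    cdf = sum(comb(n, j) * s ** j * (1 - s) ** (n - j) for j in range(k // 2 + 1))
    return q ** n * cdf


def is_kwise_uniform(rows, k, q):
    """Check that k coordinates of a uniformly chosen row are always uniform on {0..q-1}^k."""
    n = len(rows[0])
    for cols in combinations(range(n), k):
        counts = {}
        for row in rows:
            key = tuple(row[c] for c in cols)
            counts[key] = counts.get(key, 0) + 1
        # uniform means: every pattern occurs, and all patterns occur equally often
        if len(counts) != q ** k or len(set(counts.values())) != 1:
            return False
    return True


# Illustrative toy example (our choice): even-weight binary vectors of length 3.
even_weight_rows = [r for r in product(range(2), repeat=3) if sum(r) % 2 == 0]
```

Here `even_weight_rows` has 4 rows, which equals `rao_bound(3, 2, 2)`, so the bound is attained in this small case.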

4.9 Minimal probability that all bits are 1 

Define $m(n,k,p) := \min_{Q \in \mathcal{A}(n,k,p)} Q(\text{All bits are 1})$. $m(n,k,p)$ can well be 0; in fact

Proposition 26 When $k < n$ and $p \le \frac{1}{2}$ we have $m(n,k,p) = 0$.

Indeed, for $p = \frac{1}{2}$ we can take the XOR0 or XOR1 distribution according to the parity of $n$, and for lower $p$'s we can take the AND of this distribution with a fully independent distribution.
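The argument can be illustrated by brute force on a small case. The sketch below is ours (the helper names and the choice $n = 4$, $p = \frac{1}{3}$ are assumptions made for illustration): it builds the XOR distribution with parity opposite to that of the all ones vector, takes the AND with independent Bernoulli($\frac{2}{3}$) bits, and checks $(n-1)$-wise independence exactly.

```python
from fractions import Fraction
from itertools import combinations, product


def xor_distribution(n, parity):
    """Uniform distribution on n-bit strings whose number of ones has the given parity."""
    atoms = [x for x in product((0, 1), repeat=n) if sum(x) % 2 == parity]
    return {x: Fraction(1, len(atoms)) for x in atoms}


def and_with_independent(dist, s):
    """Law of the bitwise AND of a sample from dist with i.i.d. Bernoulli(s) bits."""
    n = len(next(iter(dist)))
    out = {}
    for x, px in dist.items():
        for y in product((0, 1), repeat=n):
            py = Fraction(1)
            for bit in y:
                py *= s if bit else 1 - s
            z = tuple(a & b for a, b in zip(x, y))
            out[z] = out.get(z, Fraction(0)) + px * py
    return out


def is_kwise_independent(dist, k, p):
    """Check that every k coordinates are independent Bernoulli(p) bits."""
    n = len(next(iter(dist)))
    for cols in combinations(range(n), k):
        for pattern in product((0, 1), repeat=k):
            prob = sum(w for x, w in dist.items()
                       if all(x[c] == b for c, b in zip(cols, pattern)))
            expected = Fraction(1)
            for b in pattern:
                expected *= p if b else 1 - p
            if prob != expected:
                return False
    return True


n = 4
# Parity opposite to that of the all ones vector, so that vector gets probability 0.
half = xor_distribution(n, parity=(n + 1) % 2)
# AND with Bernoulli(2/3) bits lowers the marginal from 1/2 to 1/3.
third = and_with_independent(half, Fraction(2, 3))
```

Both `half` and `third` are 3-wise independent on 4 bits, and neither puts any mass on the all ones vector.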

For $p > \frac{1}{2}$, define $n_c(k,p) := \min\{n \mid m(n,k,p) = 0\}$. Our main result in this section is a two-sided bound on $n_c(k,p)$.

Theorem 27 For any $p > \frac{1}{2}$

$$n_c(k,p) \ge \begin{cases} \frac{k}{2(1-p)} + 1 & k \text{ even} \\ \frac{k+1}{2(1-p)} & k \text{ odd} \end{cases} \qquad (14)$$

and when $1 - p = \frac{1}{q}$ for a prime power $q$, and $C > 0$ is a large constant,

$$n_c(k,p) \le C \, \frac{k}{1-p} \log\left(\frac{1}{1-p}\right) \qquad (15)$$

The upper bound is based on the Gilbert-Varshamov bound of error-correcting codes (see [34]) and one extra idea. The lower bound poses the main difficulty, and for it we need a different aspect of the TCMP. Fix $k$ and $p > \frac{1}{2}$ and let $Q \in \mathcal{A}(n_c(k,p),k,p)$ satisfy $Q(\text{All ones vector}) = 0$. Let, as usual, $S$ count the number of 1's. Writing $n = n_c(k,p)$, $S$ is supported on $[0, n-1]$ and has the first $k$ moments of a $Bin(n,p)$. Theorems in the TCMP show this is only possible if $n_c$ satisfies (14). The actual verification is technical and involves calculating determinants. Consult the appendix for more details.

As in the previous section, one can deduce from this a result about orthogonal arrays. Suppose $B$ is an $OA(n,k,q)$ with the property that each row contains the symbol 0. If $k$ is even, then necessarily $n \ge 1 + \frac{kq}{2}$, and if $k$ is odd then $n \ge \frac{(k+1)q}{2}$.

5 Open questions 

Below we list some of our main open questions: 

1. Say anything non-trivial about the extremal points of A(n, k,p). 

2. Is "sandwich $L_1$" approximation stronger than $L_2$ approximation?

3. What is K 2 (Maj 3 )? What is K 2 (M&f)? 

4. What is $K_2$ for iterated majority of threes?

6 Appendix

6.1 $K_1$ and $K_2$

It is not a priori clear whether one of these classes contains the other. Assume that $\lim P_{p_c}(f = 1)$ exists and denote it by $\alpha$. We have the following simple result:

Claim 28 For any $k \in K_2(f)$, for $p < p_c$ we have $\lim \epsilon^*(k,p) \le \alpha$ and for $p > p_c$ we have $\lim \epsilon^*(k,p) \le 1 - \alpha$.

Proof. Obviously, both $\max_{Q \in \mathcal{A}(n,k,p)} Q(f = 1)$ and $\min_{Q \in \mathcal{A}(n,k,p)} Q(f = 1)$ are increasing functions of $p$. The claim now follows immediately from the fact that $\lim \max_{Q \in \mathcal{A}(n,k,p_c)} Q(f = 1) = \lim \min_{Q \in \mathcal{A}(n,k,p_c)} Q(f = 1) = \alpha$. ■

So, while we do not know whether $\epsilon^*(k,p) \to 0$ for $k \in K_2$, we do know that it cannot be too large.

6.2 Percolation 

Here is the more general result, of which lemma 22 is a corollary (put $l = 1$).

Lemma 29 (combining distributions) Fix integers $l, m \ge 1$. Suppose for each $1 \le i \le m$ we have random vectors $X^i := (X^i_1, \ldots, X^i_n) \in \mathcal{A}_r(n,k)$ and $Y^i := (Y^i_1, \ldots, Y^i_n) \in \mathcal{A}_r(n, lk+l+k)$. Suppose that the $X$ vectors are independent among themselves and independent from the $Y$ vectors, and that the $Y$ vectors are $l$-wise independent among themselves. Then the vector with the following coordinates

$$\begin{array}{cccc} X^1_1 + Y^1_1, & X^1_2 + Y^1_2, & \ldots, & X^1_n + Y^1_n \\ X^2_1 + Y^2_1, & X^2_2 + Y^2_2, & \ldots, & X^2_n + Y^2_n \\ \vdots & \vdots & & \vdots \\ X^m_1 + Y^m_1, & X^m_2 + Y^m_2, & \ldots, & X^m_n + Y^m_n \end{array} \qquad (16)$$

is in $\mathcal{A}_r(mn, lk+l+k)$.
Proof.

Call the resulting distribution $Z$, where $Z^i := (Z^i_1, Z^i_2, \ldots, Z^i_n)$ and $Z^i_j := X^i_j + Y^i_j$. Take a set $S$ of at most $lk+l+k$ variables from the vector $Z$; we need to show they are independent and uniformly distributed. Suppose that $a^i$ of them are from $Z^i$ for each $1 \le i \le m$; WLOG we can assume that for each $i$ these are $Z^i_1, \ldots, Z^i_{a^i}$. Consider only the $i$'s for which $a^i \ge k+1$. Since $|S| \le l(k+1)+k$ we can have at most $l$ such $i$'s; WLOG suppose these are $a^1, \ldots, a^t$ for $t \le l$. Now, fix some values $c^i_j \in \mathbb{Z}_r$ for $1 \le i \le m$, $1 \le j \le a^i$, and define the events $A := \{Z^i_j = c^i_j \text{ for all } 1 \le i \le t \text{ and } 1 \le j \le a^i\}$ and $B := \{Z^i_j = c^i_j \text{ for all } t+1 \le i \le m \text{ and } 1 \le j \le a^i\}$. We need to show that $P(A,B) = r^{-|S|}$. We start with

$$\begin{aligned} P(A) &= E\big(P(A \mid (X^i)_{i=1}^{t})\big) \\ &= E\big(P(Y^i_j = c^i_j - X^i_j \text{ for } 1 \le i \le t,\ 1 \le j \le a^i \mid (X^i)_{i=1}^{t})\big) \\ &= E\big(r^{-\sum_{i=1}^{t} a^i}\big) = r^{-\sum_{i=1}^{t} a^i} \end{aligned} \qquad (17)$$

where the next to last equality follows since the $Y^i_j$ are uniform and independent from the $X$'s, since they are $l$-wise independent as vectors (and $l \ge t$), since they are $(lk+l+k)$-wise independent inside each vector and since $a^i \le |S| \le lk+l+k$.

To finish the lemma we need to show that $P(B \mid A) = r^{-\sum_{i=t+1}^{m} a^i}$. We will show something stronger, that in fact $P(B \mid (X^i)_{i=1}^{t}, (Y^i)_{i=1}^{m}) = r^{-\sum_{i=t+1}^{m} a^i}$. Indeed

$$\begin{aligned} P\big(B \mid (X^i)_{i=1}^{t}, (Y^i)_{i=1}^{m}\big) &= P\big(X^i_j = c^i_j - Y^i_j \text{ for } t+1 \le i \le m,\ 1 \le j \le a^i \mid (X^i)_{i=1}^{t}, (Y^i)_{i=1}^{m}\big) \\ &= r^{-\sum_{i=t+1}^{m} a^i} \end{aligned} \qquad (18)$$

where the last equality follows since the $X^i_j$ for $t+1 \le i \le m$ are uniform, independent from the $Y$'s and from $(X^i)_{i=1}^{t}$, since $a^i \le k$ for each $t+1 \le i \le m$ by the definition of $t$, and since $(X^i)_{i=1}^{m}$ are $k$-wise independent. This finishes the proof of the lemma. ■
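A small instance of the lemma can be verified by enumeration. The following sketch is ours and uses illustrative parameters $r = 2$, $n = 2$, $k = 1$, $l = 1$, $m = 2$; since $n < lk+l+k$ here, membership of $Y^i$ in $\mathcal{A}_r(n, lk+l+k)$ simply means that its two coordinates are fully independent. The helper name `is_kwise_uniform` is an assumption of this sketch.

```python
from itertools import combinations, product


def is_kwise_uniform(samples, k, r):
    """samples are equally likely outcomes; check every k coordinates are uniform on Z_r^k."""
    n = len(samples[0])
    for cols in combinations(range(n), k):
        counts = {}
        for s in samples:
            key = tuple(s[c] for c in cols)
            counts[key] = counts.get(key, 0) + 1
        if len(counts) != r ** k or len(set(counts.values())) != 1:
            return False
    return True


r, n, k, l, m = 2, 2, 1, 1, 2
samples = []
# b1, b2 are independent bits and X^i = (b_i, b_i), which is only 1-wise independent (k = 1).
# (y1, y2) is one fully independent pair reused for both blocks, so the Y vectors are
# only 1-wise independent among themselves (l = 1).
for b1, b2, y1, y2 in product(range(r), repeat=4):
    x = [(b1, b1), (b2, b2)]
    y = [(y1, y2), (y1, y2)]
    z = tuple((x[i][j] + y[i][j]) % r for i in range(m) for j in range(n))
    samples.append(z)

strength = l * k + l + k  # the independence promised by the lemma, here 3
```

In this instance the combined vector on $mn = 4$ coordinates is 3-wise independent, as the lemma predicts, but not 4-wise independent, because the four coordinates always sum to 0 modulo 2.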

6.3 The classical moment problem 

Here is the general setup of the classical moment problem (p], [28]) leading to the bounds of 
theorem [7J It is followed by a definition of the Krawtchouk polynomials and a proof of §5§ and 
®. 

Consider a real sequence $S := \{s_m\}_{m=0}^{k}$, with $s_0 = 1$ (this last condition is convenient for us in order to use probabilistic notation, but it is not necessary for the results of the classical moment problem). Define

$$\mathcal{A}_S = \{Q \mid Q \text{ a probability distribution on } \mathbb{R},\ s_m = E_Q(X^m) \text{ for } 0 \le m \le k\} \qquad (19)$$

to be the set of all probability distributions with these first $k$ moments ($X$ is a random variable distributed according to $Q$).

Definition 30 Given $S = \{s_m\}_{m=0}^{k}$ with $s_0 = 1$ and $k$ even, define the orthogonal polynomials with respect to $S$, $\{P_m\}_{m=0}^{k/2}$, as the unique polynomials with the following properties:

1. $P_m$ is a polynomial of degree $m$ with positive leading coefficient.

2. Defining formally a linear operator $T$ from polynomials of degree at most $k$ to reals by $T(x^i) := s_i$ for $0 \le i \le k$, we have $T(P_l(x) P_m(x)) = \delta_{l,m}$.

Note that the second condition is the same as requiring $E_Q(P_l(X) P_m(X)) = \delta_{l,m}$ for any $Q \in \mathcal{A}_S$.

We remark that these polynomials cannot be defined for degree larger than $n$ if the sequence $S$ corresponds to the moments of an atomic distribution with only $n$ atoms.

Define also the function $\rho_n(x) := \left(\sum_{i=0}^{n} P_i(x)^2\right)^{-1}$; then we have the following

Theorem 31 [1, 2.5.2 and 2.5.4] For any $x$ and any $Q_1, Q_2 \in \mathcal{A}_S$

$$|Q_1(X < x) - Q_2(X \le x)| \le \rho_{k/2}(x) \qquad (20)$$

and in particular when $Q_1 = Q_2$

$$\max_{Q \in \mathcal{A}_S} Q(X = x) \le \rho_{k/2}(x) \qquad (21)$$

We remark that in many cases, the theory also has constructions which achieve these bounds, but 
we could not use these since in the cases we needed we required the support of the distribution 
to be on integer points. It is possible, however, that a modification of these constructions can 
yield a distribution on integer points, this would be very useful to show the sharpness of the 
bounds in the cases we use. 



6.3.1 Krawtchouk polynomials 

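In our work we utilize the orthogonal polynomials corresponding to the moments of the binomial distribution (i.e., when $s_m = E(X^m)$ where $X \sim Bin(n,p)$). These are the well-known Krawtchouk polynomials (see [13]). For given $n$ and $p$, the $m$'th polynomial ($0 \le m \le n$) is given by

$$P_m(x) = \left(\binom{n}{m} (p(1-p))^m\right)^{-1/2} \sum_{j=0}^{m} (-1)^{m-j} \binom{n-x}{m-j} \binom{x}{j} p^{m-j} (1-p)^{j} \qquad (22)$$

where for real $x$ and integer $b \ge 1$, $\binom{x}{b} := \frac{x(x-1)\cdots(x-b+1)}{b!}$ and $\binom{x}{0} := 1$.

We note that

$$P_m(n)^2 = \binom{n}{m} \left(\frac{1-p}{p}\right)^{m} \qquad (23)$$

Hence

$$\rho_m(n) = \left(\sum_{i=0}^{m} \binom{n}{i} \left(\frac{1-p}{p}\right)^{i}\right)^{-1} = \frac{p^n}{P(Bin(n, 1-p) \le m)} \qquad (24)$$

Furthermore, for $p = \frac{1}{2}$

$$P_m\left(\frac{n}{2}\right) = \binom{n}{m}^{-1/2} \sum_{j=0}^{m} (-1)^{m-j} \binom{n/2}{m-j} \binom{n/2}{j} \qquad (25)$$

but, as is well known, since the sum is the coefficient of $z^m$ in the power series expansion of $f(z) := (1+z)^{n/2} (1-z)^{n/2}$ and since $f(z) = (1-z^2)^{n/2}$, we get by the binomial formula that

$$P_m\left(\frac{n}{2}\right) = \binom{n}{m}^{-1/2} \cdot \begin{cases} 0 & m \text{ odd} \\ (-1)^{m/2} \binom{n/2}{m/2} & m \text{ even} \end{cases} \qquad (26)$$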
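These identities lend themselves to an exact numerical check. The sketch below is ours: it works with the unnormalized polynomials (the sum in (22) without the prefactor), assumes the standard squared norm $\binom{n}{m}(p(1-p))^m$, restricts to integer evaluation points, and uses exact fractions. All function names are illustrative.

```python
from fractions import Fraction
from math import comb


def krawtchouk(m, x, n, p):
    """Unnormalized Krawtchouk polynomial of degree m at an integer point 0 <= x <= n."""
    return sum((-1) ** (m - j) * comb(n - x, m - j) * comb(x, j)
               * p ** (m - j) * (1 - p) ** j for j in range(m + 1))


def inner_product(l, m, n, p):
    """E[K_l(X) K_m(X)] for X ~ Bin(n, p), computed exactly."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x)
               * krawtchouk(l, x, n, p) * krawtchouk(m, x, n, p)
               for x in range(n + 1))


def squared_norm(m, n, p):
    """Assumed value of E[K_m(X)^2]; dividing K_m by its square root gives P_m."""
    return comb(n, m) * (p * (1 - p)) ** m


def rho_inverse_at_n(m, n, p):
    """sum_{i <= m} P_i(n)^2, written via the unnormalized polynomials."""
    return sum(krawtchouk(i, n, n, p) ** 2 / squared_norm(i, n, p) for i in range(m + 1))


def binomial_cdf(m, n, s):
    """P(Bin(n, s) <= m)."""
    return sum(comb(n, j) * s ** j * (1 - s) ** (n - j) for j in range(m + 1))
```

For example, with `n, p = 6, Fraction(1, 3)` one can compare `rho_inverse_at_n(m, n, p)` with `binomial_cdf(m, n, 1 - p) / p ** n`, which is the content of (24), and at $p = \frac{1}{2}$ the values `krawtchouk(m, n // 2, n, Fraction(1, 2))` vanish for odd $m$, in line with (26).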



We then obtain 

Lemma 32 For p = \, even n and even m < 



,n. 2 



(27)

Proof. Using (26) we have



M=) = ( E (J) 



n \ l n/2 



J 



(28) 



we recall the well-known estimate that for any integer $a > 0$, $a! = \sqrt{2\pi a} \left(\frac{a}{e}\right)^a e^{\lambda_a}$ where $\frac{1}{12a+1} < \lambda_a < \frac{1}{12a}$. Using this we notice that

a a a \ \ \ 



2vr6(a - b) (a - b) a - b b h 



Hence after cancelation 



\ - 1 / /o> 2 
n \ / n/2 



2jy V J / V 7r ( n - 2 i)i 



so for §, j,(§ - j) > 1 we get 



n \ 1 / n/2 
2j 



Plugging back into ([28]) we get 



> 



-i 



n __l / 1 

— . — e 12 > w — 

7r(n - 2j)j V 8j 



Mi), 1 + M <i + ^(Vf->))' 



< 



(29) 



(30) 



(31) 



(32) 



6.4 Majority 

We now continue and give a sketch of the proof of the lower bound for theorem 10.

Proof (sketch of lower bound in theorem 10). Fix an odd $n$ and $2 \le k \le n$. We would like to construct a distribution $Q \in \mathcal{A}(n,k,\frac{1}{2})$ such that when we define $S$ to be the number of bits which are 1 when sampling from $Q$ then

^T'-^OT (33) 

for some C > 0. We may assume that k < Cj^-^ for some small c > 0, otherwise the bound 
follows trivially by taking the distribution XOR0 and using the bound that it satisfies (see 
theorem [TU1 Let M := C ^1 k t " g g be an integer, the idea of the proof is to construct Q in 

such a way that with high probability $S \equiv L \bmod M$ for some fixed integer $0 \le L < M$, and furthermore that on this event $S$ behaves like a $Bin(n,\frac{1}{2})$ random variable conditioned to be $L \bmod M$. Such an $S$ will satisfy (33) for the correct choice of $L$.

To do this, we consider a distribution $\tilde{Q}$ on $(X_1, \ldots, X_{k+1}) \in \mathbb{Z}_M^{k+1}$ such that all the $X_i$ are IID uniform in $\mathbb{Z}_M$, except that one of them is chosen so that their sum is always $L$ modulo $M$. This distribution is of course $k$-wise independent. The required distribution $Q$ is a distribution on $n$ bits $(Y_1, \ldots, Y_n)$. We create it from $\tilde{Q}$ by dividing the $Y$'s into $k+1$ disjoint groups of bits, with each $X_i$ responsible for the value of one of these groups in the following way: when observing the value of the $X$ variable, we sample as uniformly as is possible a string of bits for the $Y$ variables in its group such that their sum modulo $M$ equals the value of the $X$ variable.
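The $k$-wise independence of the auxiliary distribution on $\mathbb{Z}_M^{k+1}$ is easy to confirm by enumeration. The sketch below is ours, with illustrative parameters $k = 3$, $M = 4$, $L = 1$ and hypothetical helper names; it forces the last coordinate, which by symmetry is no loss of generality.

```python
from itertools import combinations, product


def fixed_sum_samples(k, M, L):
    """All (k+1)-tuples over Z_M whose coordinates sum to L modulo M, each equally likely."""
    samples = []
    for free in product(range(M), repeat=k):
        last = (L - sum(free)) % M  # forced so that the total is L mod M
        samples.append(free + (last,))
    return samples


def is_kwise_uniform(samples, k, r):
    """samples are equally likely outcomes; check every k coordinates are uniform on Z_r^k."""
    n = len(samples[0])
    for cols in combinations(range(n), k):
        counts = {}
        for s in samples:
            key = tuple(s[c] for c in cols)
            counts[key] = counts.get(key, 0) + 1
        if len(counts) != r ** k or len(set(counts.values())) != 1:
            return False
    return True


samples = fixed_sum_samples(3, 4, 1)
```

Any 3 of the 4 coordinates are uniform and independent, while all 4 together are not, since their sum is pinned to $L$.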

The parameters have been chosen in such a way that the probability of failing, in even one of the $Y$ groups, to have the correct sum modulo $M$ is very small. Hence the distribution $Q$ thus constructed satisfies the required properties. ■

6.5 More bounds on the maximal probability that all bits are 1

In this section we detail more bounds on the maximal probability that all bits are 1. Recall $M(n,k,p) := \max_{Q \in \mathcal{A}(n,k,p)} Q(\text{All bits are 1})$.
In the main text we have shown

Theorem 33 For even $k$

$$M(n,k,p) \le \frac{p^n}{P(Bin(n, 1-p) \le \frac{k}{2})} \qquad (34)$$

In particular for even k 



M(n, k,p) < 2Vk V - r- (35) 

and for even k and n(l — p) < | 

M(n, k,p) < Wp n (36) 
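As a sanity check on the first bound of the theorem, read here as $p^n / P(Bin(n,1-p) \le \frac{k}{2})$, one can evaluate it exactly in a case where the answer is known. The sketch below is ours (the function name is illustrative): for odd $n$, $k = n-1$ and $p = \frac{1}{2}$, the XOR distribution containing the all ones vector has atoms of size $2^{-(n-1)}$, and the bound gives the same value.

```python
from fractions import Fraction
from math import comb


def moment_upper_bound(n, k, p):
    """p^n / P(Bin(n, 1 - p) <= k/2) for even k, computed exactly."""
    q = 1 - p
    cdf = sum(comb(n, j) * q ** j * p ** (n - j) for j in range(k // 2 + 1))
    return p ** n / cdf


n = 5
bound = moment_upper_bound(n, n - 1, Fraction(1, 2))
xor_atom = Fraction(1, 2 ** (n - 1))  # atom size of the XOR distribution on n bits
```

Here `bound == xor_atom`, so in this small case the bound is attained.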

We now complement these with lower bounds on $M(n,k,p)$. Both lower bounds come from well-known constructions of linear error-correcting codes. In both we assume $p = \frac{1}{q}$ for either a prime, or a prime power, $q$. To get the bounds we first construct the linear code over $GF(q)$, then pass to its dual code, well known to be an orthogonal array. We then sample a row of the orthogonal array uniformly and map to bits using $0 \mapsto 1$, with the rest of the elements mapping to 0. We obtain

Theorem 34 Using the Gilbert-Varshamov bound, for $p = \frac{1}{q}$ with $q$ a prime power

$$M(n,k,p) \ge p \left(\frac{p(k-1)}{en}\right)^{k-1} \qquad (37)$$

and using BCH codes, when $p = \frac{1}{q}$ with $q$ a prime (not a prime power!), $k \equiv 1 \pmod q$ and $n+1$ is a power of $q$, then

$$M(n,k,p) \ge p \left(\frac{1}{n+1}\right)^{(k-1)(1-p)} \qquad (38)$$

We add that there is a gap in the exponent between these lower bounds and our upper bounds: the upper bounds have exponent $\frac{k}{2}$ and the lower bounds have, at best, exponent $(k-1)(1-p)$. We do not know how to close this gap, but remark that it is also present in the theory of error-correcting codes; for a paper discussing this gap for error-correcting codes and the known results there see [14].

We end this section by remarking on one more exact result, for very small p's 

Proposition 35 When p < 

M(n,k,p)=p k (39) 

This follows quite simply from a direct construction of the distribution. We start by putting probability $p^k$ on the all ones vector; all the remaining atom probabilities are then determined by being $k$-wise independent with marginal $p$, and we check that for this range of $p$'s all these other probabilities are indeed positive. This is the same as the fact that the weight distribution of an MDS code is determined, see

6.6 More on the minimal probability that all bits are 1

We remark on the proof of theorem 27. The construction for the upper bound on $n_c(k,p)$ goes as follows. We start with an orthogonal array with very good parameters over $GF(q)$ (where now $q = \frac{1}{1-p}$) obtained using the Gilbert-Varshamov bound. We then choose a row uniformly and map each of its coordinates to bits. The mapping is chosen so that in each coordinate exactly one element of $GF(q)$ is mapped to 0 and the rest to 1, but this element is chosen in a greedy fashion to minimize the chance of not having a 0 anywhere. When $n$ is large enough compared to $k$, this idea succeeds in giving a distribution with probability 0 for the all ones vector. This gives the upper bound of the theorem.

As detailed in the main text, the lower bound follows from an existence theorem in the theory of the TCMP; we give this theorem here for easy reference.

Let $X \sim Bin(n,p)$ and define $s_j := E(X^j)$. Define the matrices

$$A(m) := (s_{i+j})_{i,j=0}^{m}, \qquad B(m) := (s_{i+j+1})_{i,j=0}^{m}, \qquad C(m) := (s_{i+j})_{i,j=1}^{m} \qquad (40)$$

then the classical moment problem states (see [1], [28] or [11] which contains a survey)

Proposition 36 A random variable $S$ with moment sequence $\{s_j\}_{j=0}^{k}$ supported on $[a,b]$ exists if and only if one of the following holds:

1. $k$ is odd and $bA(\frac{k-1}{2}) \ge B(\frac{k-1}{2}) \ge aA(\frac{k-1}{2})$.

2. $k$ is even, $A(\frac{k}{2}) \ge 0$ and $(a+b)B(\frac{k}{2} - 1) \ge abA(\frac{k}{2} - 1) + C(\frac{k}{2})$.

where, as usual, $A \ge B$ means that $A - B$ is non-negative definite.
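For $k = 2$ all the matrices in the second condition are $1 \times 1$, so the criterion can be checked by hand or by a few lines of code. The sketch below is ours, with illustrative function names: it takes $[a,b] = [0, n-1]$ and the first two moments of $Bin(n,p)$, and searches for the smallest $n$ at which the conditions hold. This is only the necessary condition on $S$ used for the lower bound, and for $k = 2$ it reproduces the threshold $\frac{k}{2(1-p)} + 1$.

```python
from fractions import Fraction


def binomial_moments(n, p):
    """First two moments s_1, s_2 of Bin(n, p)."""
    s1 = n * p
    s2 = n * p * (1 - p) + (n * p) ** 2
    return s1, s2


def feasible_k2(n, p, a, b):
    """The even-k conditions for k = 2, where B(0), A(0), C(1) are the scalars s_1, 1, s_2."""
    s1, s2 = binomial_moments(n, p)
    hankel_psd = s2 - s1 ** 2 >= 0  # A(1) = [[1, s1], [s1, s2]] is non-negative definite
    support_ok = (a + b) * s1 >= a * b + s2
    return hankel_psd and support_ok


def smallest_feasible_n(p):
    """Smallest n for which [0, n-1] can carry the first two moments of Bin(n, p)."""
    n = 1
    while not feasible_k2(n, p, 0, n - 1):
        n += 1
    return n
```

For instance, `smallest_feasible_n(Fraction(3, 4))` returns 5, which equals $1 + \frac{1}{1-p}$ for $p = \frac{3}{4}$.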